GENCODE: the reference human genome annotation for The ENCODE Project.

Authors

  • Jennifer Harrow
  • Adam Frankish
  • Jose M Gonzalez
  • Electra Tapanari
  • Mark Diekhans
  • Felix Kokocinski
  • Bronwen L Aken
  • Daniel Barrell
  • Amonida Zadissa
  • Stephen Searle
  • If Barnes
  • Alexandra Bignell
  • Veronika Boychenko
  • Toby Hunt
  • Mike Kay
  • Gaurab Mukherjee
  • Jeena Rajan
  • Gloria Despacio-Reyes
  • Gary Saunders
  • Charles Steward
  • Rachel Harte
  • Michael Lin
  • Cédric Howald
  • Andrea Tanzer
  • Thomas Derrien
  • Jacqueline Chrast
  • Nathalie Walters
  • Suganthi Balasubramanian
  • Baikang Pei
  • Michael Tress
  • Jose Manuel Rodriguez
  • Iakes Ezkurdia
  • Jeltje van Baren
  • Michael Brent
  • David Haussler
  • Manolis Kellis
  • Alfonso Valencia
  • Alexandre Reymond
  • Mark Gerstein
  • Roderic Guigó
  • Tim J Hubbard
Abstract

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of annotated alternatively spliced transcripts has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available, with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two-exon models, indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.

Related articles

Functional transcriptomics in the post-ENCODE era.

The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-...

Improving GENCODE reference gene annotation using a high-stringency proteogenomics workflow

Complete annotation of the human genome is indispensable for medical research. The GENCODE consortium strives to provide this, augmenting computational and experimental evidence with manual annotation. The rapidly developing field of proteogenomics provides evidence for the translation of genes into proteins and can be used to discover and refine gene models. However, for both the proteomics an...

Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplificati...

APPRIS: annotation of principal and alternative splice isoforms

Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows ann...

Journal:
  • Genome Research

Volume 22, Issue 9

Pages: -

Publication date: 2012